NSF PAR Search | NSF Public Access Repository

Self-Enhancing Video Data Management System for Compositional Events with Large Language Models

https://doi.org/10.1145/3725352

Zhang, Enhao; Sullivan, Nicole; Haynes, Brandon; Krishna, Ranjay; Balazinska, Magdalena (June 2025, Proceedings of the ACM on Management of Data)

Complex video queries can be answered by decomposing them into modular subtasks. However, existing video data management systems assume the existence of predefined modules for each subtask. We introduce VOCAL-UDF, a novel self-enhancing system that supports compositional queries over videos without the need for predefined modules. VOCAL-UDF automatically identifies and constructs missing modules and encapsulates them as user-defined functions (UDFs), thus expanding its querying capabilities. To achieve this, we formulate a unified UDF model that leverages large language models (LLMs) to aid in new UDF generation. VOCAL-UDF handles a wide range of concepts by supporting both program-based UDFs (i.e., Python functions generated by LLMs) and distilled-model UDFs (lightweight vision models distilled from strong pretrained models). To resolve the inherent ambiguity in user intent, VOCAL-UDF generates multiple candidate UDFs and uses active learning to efficiently select the best one. With the self-enhancing capability, VOCAL-UDF significantly improves query performance across three video datasets.
Full Text Available
Designing LLM Chains by Adapting Techniques from Crowdsourcing Workflows

https://doi.org/10.1145/3716134

Grunde-McLaughlin, Madeleine; Lam, Michelle S; Krishna, Ranjay; Weld, Daniel S; Heer, Jeffrey (June 2025, ACM Transactions on Computer-Human Interaction)

LLM chains enable complex tasks by decomposing work into a sequence of subtasks. Similarly, the more established techniques of crowdsourcing workflows decompose complex tasks into smaller tasks for human crowdworkers. Chains address LLM errors analogously to the way crowdsourcing workflows address human error. To characterize opportunities for LLM chaining, we survey 107 papers across the crowdsourcing and chaining literature to construct a design space for chain development. The design space covers a designer’s objectives and the tactics used to build workflows. We then surface strategies that mediate how workflows use tactics to achieve objectives. To explore how techniques from crowdsourcing may apply to chaining, we adapt crowdsourcing workflows to implement LLM chains across three case studies: creating a taxonomy, shortening text, and writing a short story. From the design space and our case studies, we identify takeaways for effective chain design and raise implications for future research and development.
Full Text Available
VIDEOSHOP: Localized Semantic Video Editing with Noise-Extrapolated Diffusion Inversion

https://doi.org/10.1007/978-3-031-73254-6_14

Fan, Xiang; Bhattad, Anand; Krishna, Ranjay (November 2024, Springer Nature Switzerland)

Full Text Available
Spurious Rewards: Rethinking Training Signals in RLVR

Shao, Rulin; Li, Shuyue Stella; Xin, Rui; Geng, Scott; Wang, Yiping; Oh, Sewoong; Du, Simon Shaolei; Lambert, Nathan; Min, Sewon; Krishna, Ranjay; et al. (June 2025, cs.AI)

Full Text Available
VOCALExplore: Pay-as-You-Go Video Data Exploration and Model Building

https://doi.org/10.14778/3625054.3625057

Daum, Maureen; Zhang, Enhao; He, Dong; Mussmann, Stephen; Haynes, Brandon; Krishna, Ranjay; Balazinska, Magdalena (September 2023, Proceedings of the VLDB Endowment)

We introduce VOCALExplore, a system designed to support users in building domain-specific models over video datasets. VOCALExplore supports interactive labeling sessions and trains models using user-supplied labels. VOCALExplore maximizes model quality by automatically deciding how to select samples based on observed skew in the collected labels. It also selects the optimal video representations to use when training models by casting feature selection as a rising bandit problem. Finally, VOCALExplore implements optimizations to achieve low latency without sacrificing model performance. We demonstrate that VOCALExplore achieves close to the best possible model quality given candidate acquisition functions and feature extractors, and it does so with low visible latency (~1 second per iteration) and no expensive preprocessing.
Full Text Available
EQUI-VOCAL: Synthesizing Queries for Compositional Video Events from Limited User Interactions

https://doi.org/10.14778/3611479.3611482

Zhang, Enhao; Daum, Maureen; He, Dong; Haynes, Brandon; Krishna, Ranjay; Balazinska, Magdalena (July 2023, Proceedings of the VLDB Endowment)

We introduce EQUI-VOCAL, a new system that automatically synthesizes queries over videos from limited user interactions. The user only provides a handful of positive and negative examples of what they are looking for. EQUI-VOCAL utilizes these initial examples and additional ones collected through active learning to efficiently synthesize complex user queries. Our approach enables users to find events without database expertise, with limited labeling effort, and without declarative specifications or sketches. Core to EQUI-VOCAL's design is the use of spatio-temporal scene graphs in its data model and query language and a novel query synthesis approach that works on large and noisy video data. Our system outperforms two baseline systems (in terms of F1 score, synthesis time, and robustness to noise) and can flexibly synthesize complex queries that the baselines do not support.
Full Text Available
EQUI-VOCAL Demonstration: Synthesizing Video Queries from User Interactions

Zhang, Enhao; Daum, Maureen; He, Dong; Ganti, Manasi; Haynes, Brandon; Krishna, Ranjay; Balazinska, Magdalena (August 2023, Proceedings of the VLDB Endowment)

We demonstrate EQUI-VOCAL, a system that synthesizes compositional queries over videos from user feedback. EQUI-VOCAL enables users to query a video database for complex events by providing a few positive and negative examples of what they are looking for and labeling a small number of additional system-selected examples. Using those user inputs, EQUI-VOCAL synthesizes declarative queries that can then retrieve additional instances of the desired events. The demonstration makes two contributions: it introduces EQUI-VOCAL’s graphical user interface and enables conference attendees to experiment with EQUI-VOCAL on a variety of queries. Both enable users to gain a better understanding of EQUI-VOCAL’s query synthesis approach and to explore the impact of hyperparameters and label noise on system performance.
Full Text Available
Socially situated artificial intelligence enables learning from human interaction

https://doi.org/10.1073/pnas.2115730119

Krishna, Ranjay; Lee, Donsuk; Fei-Fei, Li; Bernstein, Michael S. (September 2022, Proceedings of the National Academy of Sciences)

Regardless of how much data artificial intelligence agents have available, agents will inevitably encounter previously unseen situations in real-world deployments. Reacting to novel situations by acquiring new information from other people—socially situated learning—is a core faculty of human development. Unfortunately, socially situated learning remains an open challenge for artificial intelligence agents because they must learn how to interact with people to seek out the information that they lack. In this article, we formalize the task of socially situated artificial intelligence—agents that seek out new information through social interactions with people—as a reinforcement learning problem where the agent learns to identify meaningful and informative questions via rewards observed through social interaction. We manifest our framework as an interactive agent that learns how to ask natural language questions about photos as it broadens its visual intelligence on a large photo-sharing social network. Unlike active-learning methods, which implicitly assume that humans are oracles willing to answer any question, our agent adapts its behavior based on observed norms of which questions people are or are not interested to answer. Through an 8-mo deployment where our agent interacted with 236,000 social media users, our agent improved its performance at recognizing new visual information by 112%. A controlled field experiment confirmed that our agent outperformed an active-learning baseline by 25.6%. This work advances opportunities for continuously improving artificial intelligence (AI) agents that better respect norms in open social environments.
Full Text Available
VOCAL: Video Organization and Interactive Compositional AnaLytics

Daum, Maureen; Zhang, Enhao; He, Dong; Balazinska, Magdalena; Haynes, Brandon; Krishna, Ranjay; Craig, Apryle; Wirsing, Aaron (January 2022, 12th Annual Conference on Innovative Data Systems Research (CIDR ’22))

Current video database management systems (VDBMSs) fail to support the growing number of video datasets in diverse domains because these systems assume clean data and rely on pretrained models to detect known objects or actions. Existing systems also lack good support for compositional queries that seek events consisting of multiple objects with complex spatial and temporal relationships. In this paper, we propose VOCAL, a vision of a VDBMS that supports efficient data cleaning, exploration and organization, and compositional queries, even when no pretrained model exists to extract semantic content. These techniques utilize optimizations to minimize the manual effort required of users.
Full Text Available
Conceptual Metaphors Impact Perceptions of Human-AI Collaboration

https://doi.org/10.1145/3415234

Khadpe, Pranav; Krishna, Ranjay; Fei-Fei, Li; Hancock, Jeffrey T.; Bernstein, Michael S. (October 2020, Proceedings of the ACM on Human-Computer Interaction)
Full Text Available